Indexing Issues in Supporting Similarity Searching

Author

  • Hanan Samet
Abstract

Indexing issues that arise in the support of similarity searching are presented. This includes a discussion of the curse of dimensionality, as well as multidimensional indexing, distance-based indexing, dimension reduction, and embedding methods.

Similar resources

Conceptual Search Based on Semantic Relatedness

Traditional search engines based on syntactic search are unable to solve key issues like synonymy and polysemy. Solving these issues led to the invention of the semantic web. The semantic search engines indeed overcome these issues. Nowadays the most important part of the data remains unstructured documents. It is consequently very time consuming to annotate such big data. Concept based retri...

Search Efficiency in Indexing Structures for Similarity Searching

Similarity searching finds application in a wide variety of domains including multilingual databases, computational biology, pattern recognition and text retrieval. Similarity is measured in terms of a distance function (edit distance) in general metric spaces, which is expensive to compute. Indexing techniques can be used to reduce the number of distance computations. We present an analysis of va...

Indexing and Searching Mathematics in Digital Libraries

This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware...

Efficient Document Indexing Using Pivot Tree

We present a novel method for efficiently searching top-k neighbors for documents represented in a high-dimensional space of terms based on the cosine similarity. Mostly, documents are stored as a bag-of-words tf-idf representation. One of the most used ways of computing similarity between a pair of documents is cosine similarity between the vector representations, but cosine similarity is not a met...

Grouping and Indexing Color Features for Efficient Image Retrieval

Content-based image retrieval (CBIR) aims at searching image databases for specific images that are similar to a given query image based on matching of features derived from the image content. This paper focuses on a low-dimensional color-based indexing technique for achieving efficient and effective retrieval performance. In our approach, the color features are extracted using the mean shift a...

Publication date: 2004